Assessment of AP, STEMI, NSTEMI and Therapy Prescription Based on Vascular Age: A Decision Tree Approach

Author

  • C. Premalatha
Abstract

AP, STEMI and NSTEMI are the main categories of acute coronary syndrome, which damages the coronary arteries and puts patients at high risk of death. Several studies have addressed the diagnosis and treatment of these events using different techniques, including association rules, logistic regression, fuzzy modeling, neural networks, CART and ID3. These existing techniques are confined to small datasets specific to one particular disease, and the knowledge mined from them is of limited use for classifying the risk factors for these events.

The implemented methodology uses the C4.5 and C5.0 decision tree algorithms to identify the related risk factors. It constructs two different decision trees for the events Angina Pectoris (AP), ST-Elevation Myocardial Infarction (STEMI) and Non-ST-Elevation Myocardial Infarction (NSTEMI), based on the attribute selection measures Information Gain and Gain Ratio. Using performance measures, the correctly classified instances were counted for both algorithms and the accuracy was calculated. The C4.5 and C5.0 decision tree algorithms give high classification accuracies of 86% and 89.3%, respectively, compared with the aforementioned existing techniques. A rule-based classification technique then selects a therapy for the diagnosed events based on vascular age. This helps patients reduce their risk levels and helps doctors treat the patient with the required therapy instead of angioplasty.

Keywords: Classification, Attribute selection measures, Information gain, Gain ratio, C4.5 and C5.0 decision tree algorithm, risk factors, Rule based classification.

INTRODUCTION
The objective of the implemented system was to develop a data mining system based on decision trees for assessing the risk factors related to acute coronary syndrome, with the aim of reducing these events.
Decision-tree-based algorithms give reliable and effective results. They provide high classification accuracy with a simple representation of the gathered knowledge, and they support decision-making processes in medicine. Data-mining analysis was carried out using the C4.5 and C5.0 decision tree algorithms, extracting rules based on the risk factors (age, sex, FH, SMBEF, SMAFT, TC, TG, HDLM, HDLW, GLU, HXHTN, HXDM, SBP, DBP and LDL).

The C4.5 algorithm, which uses the divide-and-conquer approach to decision tree induction, was employed. The algorithm uses a selected criterion to build the tree. It works top-down, seeking at each stage the attribute that best separates the classes, and then recursively processes the subproblems that result from the split. The C5.0 algorithm adds boosting to the decision tree. It builds a sequence of trees in which each new tree concentrates on the cases misclassified by the earlier ones, and the trees then vote on the final class. This reduces the misclassification error and gives higher accuracy for the risk factors identified for the events AP, STEMI and NSTEMI.

In the implemented system, two attribute selection measures were used: Information Gain and Gain Ratio. A different decision tree is constructed for each measure. Using performance measures, the results on the training and testing datasets are compared and the accuracy is calculated. The rule-based classification technique selects a therapy for the diagnosed events based on vascular age. This helps patients reduce their risk levels and helps doctors treat the patient with the required therapy instead of angioplasty.

International Journal of Engineering Research and General Science Volume 2, Issue 2, Feb-Mar 2014 ISSN 2091-2730 57 www.ijergs.org

[Block diagram blocks: Classification of DATASET attributes (risk factors); Dataset Preprocessing; CHD diagnosis; C4.5 and C5.0 algorithm; Therapy prescription; Patients original Dataset]
Fig. 1.
Block Diagram of the Acute Coronary Syndrome Diagnosis System

DATASET PREPROCESSING
Data preprocessing is the first processing module. Data that has not been carefully screened can produce misleading results. If the data contain much irrelevant and redundant information, or noisy and unreliable values, knowledge discovery during the training phase becomes more difficult. The representation and quality of the data must therefore be addressed before any other processing. The steps involved in dataset preprocessing are as follows:
• Missing values are filled using the K-Nearest Neighbor (K-NN) algorithm.
• Duplicate rows are removed.
• The data are coded.

The steps involved in filling up the missing values are:
1. Determine the parameter K, the number of nearest neighbors.
2. Calculate the distance between the query instance and all the training samples.
3. Sort the distances and determine the nearest neighbors based on the K-th minimum distance.
4. Gather the values of 'y' (the attribute whose value is missing) from the nearest neighbors.
5. Use the average of the nearest neighbors' values as the prediction for the query instance, and replace the missing field with this predicted value.

If two rows have the same values, so that the row is duplicated, one of them is removed from the dataset. No row is removed if at least one value differs in any column of the tuple. Duplicate removal is done after the missing values in the dataset have been filled.

    if (Rown == Rowm && Missing values == Nil) then
        Delete (Rown || Rowm)
    else if (Rown == Rowm && Missing values == found) then
        Apply K-NN
        Return (Missing value: K-NN value)
        Repeat until Missing values == Nil
        if (Rown == Rowm) then
            Delete (Rown || Rowm)
        else
            Check out next record
    else
        Return (no duplication found)

Data coding is the process of mapping the dataset attribute values to specified categorical or numerical values. It makes the representation of the risk factors precise, so that classification can be done efficiently on the simpler representation.
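The K-NN imputation steps and the duplicate-removal rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation. K = 3, Euclidean distance over the raw numeric columns (excluding the class CL and the column being filled), and the helper names `knn_impute` and `drop_duplicates` are all hypothetical, because the text does not specify them. The sketch runs on the five rows of Table I.

```python
from statistics import mean

# "?" in Table I is represented as None (an assumption of this sketch).
MISSING = None

COLUMNS = ["Age", "Sex", "FH", "SMBEF", "HXHTN", "HXDM", "SMAFT", "SBP",
           "DBP", "TC", "HDLW", "HDLM", "LDL", "TG", "GLU", "CL"]

# Table I (original dataset): one duplicated row and one missing DBP value.
TABLE_I = [
    [65, 2, 1, 1, 2, 1, 2, 80, 90, 200, 50, 30, 80, 67, 112, 1],
    [31, 1, 1, 1, 2, 1, 1, 100, 80, 45, 60, 50, 100, 56, 110, 2],
    [45, 1, 2, 2, 2, 1, 2, 149, 60, 80, 70, 40, 120, 100, 90, 3],
    [45, 1, 2, 2, 2, 1, 2, 149, 60, 80, 70, 40, 120, 100, 90, 3],
    [80, 2, 2, 1, 1, 1, 1, 150, MISSING, 190, 80, 60, 23, 150, 150, 3],
]


def distance(a, b, skip):
    """Euclidean distance over columns present in both rows, ignoring indices in `skip`."""
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in skip or x is MISSING or y is MISSING:
            continue
        total += (x - y) ** 2
    return total ** 0.5


def knn_impute(rows, k, class_index):
    """Steps 1-5: replace each missing value with the mean of that column over the k nearest rows."""
    filled = [list(r) for r in rows]
    for q, row in enumerate(rows):
        for col, value in enumerate(row):
            if value is not MISSING:
                continue
            # Step 2: distance from the query instance to every other row that has this column.
            candidates = [
                (distance(row, other, {col, class_index}), other[col])
                for j, other in enumerate(rows)
                if j != q and other[col] is not MISSING
            ]
            # Step 3: sort by distance and keep the k nearest neighbours.
            candidates.sort(key=lambda pair: pair[0])
            # Steps 4-5: average the neighbours' values and use it as the prediction.
            filled[q][col] = mean(v for _, v in candidates[:k])
    return filled


def drop_duplicates(rows):
    """Keep the first copy of each fully identical row; rows differing in any column stay."""
    seen, unique = set(), []
    for r in rows:
        key = tuple(r)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


# Impute first, then remove duplicates, following the order given in the text.
preprocessed = drop_duplicates(knn_impute(TABLE_I, k=3, class_index=COLUMNS.index("CL")))
for row in preprocessed:
    print(row)
```

Under these assumptions the three nearest rows have DBP values 90, 60 and 60, so the missing DBP is filled with 70, the value shown in Table II. The duplicated row is then dropped, leaving four rows. A different K or distance measure can give a different value; K = 2, for instance, gives 75.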
The coded values used for each risk factor are:

Risk factor                 Coded values
Age                         30-40: 1; 41-50: 2; 51-60: 3; 60+: 4
Sex                         Female: F; Male: M
Family History              Yes: Y; No: N
Diabetes                    Yes: Y; No: N
Hypertension                Yes: Y; No: N
Smoking (B/A)               Yes: Y; No: N
Systolic blood pressure     N: 120; H: >140; L: <100
Diastolic blood pressure    N: 80; H: >100; L: <70
Total Cholesterol           N: 180; H: >200
High Density Lipoprotein    N: 50-70; H: >70; L: <40
Low Density Lipoprotein     N: 130; H: >130; L: <130
Triglyceride                N: 160; H: >160
Glucose                     N: 100-140; H: >145; L: <60
Class                       AP: 1; STEMI: 2; NSTEMI: 3

TABLE I. ORIGINAL DATASET
Age Sex FH  SMBEF HXHTN HXDM SMAFT SBP DBP TC  HDLW HDLM LDL TG  GLU CL
65  2   1   1     2     1    2     80  90  200 50   30   80  67  112 1
31  1   1   1     2     1    1     100 80  45  60   50   100 56  110 2
45  1   2   2     2     1    2     149 60  80  70   40   120 100 90  3
45  1   2   2     2     1    2     149 60  80  70   40   120 100 90  3
80  2   2   1     1     1    1     150 ?   190 80   60   23  150 150 3

TABLE II. PREPROCESSED DATASET
Age Sex FH  SMBEF HXHTN HXDM SMAFT SBP DBP TC  HDLW HDLM LDL TG  GLU CL
65  2   1   1     2     1    2     80  90  200 50   30   80  67  112 1
31  1   1   1     2     1    1     100 80  45  60   50   100 56  110 2
45  1   2   2     2     1    2     149 60  80  70   40   120 100 90  3
80  2   2   1     1     1    1     150 70  190 80   60   23  150 150 3

TABLE III. CODED DATASET
Age Sex FH  SMBEF HXHTN HXDM SMAFT SBP DBP TC  HDLW HDLM LDL TG  GLU CL
3   N   Y   Y     N     Y    N     L   H   H   N    L    N   N   N   1
1   Y   Y   Y     N     Y    Y     N   N   N   N    N    H   N   H   2
1   Y   N   N     N     Y    N     H   N   N   H    N    H   N   N   3
4   N   N   Y     Y     Y    Y     H   N   H   H    H    N   H   H   3

CLASSIFICATION OF RISK FACTORS AND CHD DIAGNOSIS
The C4.5 algorithm employs a divide-and-conquer approach to construct the decision tree. It builds the tree using a selected criterion, namely one of the attribute selection measures Information Gain and Gain Ratio. The attribute with the highest measure becomes the root node, on which further splits occur. The algorithm works top-down, seeking at each stage the attribute that best separates the classes, and then recursively processes the subproblems that result from the split.

Input:
1) Training dataset D, a set of training observations and their associated class values.
2) Attribute list A, the set of candidate attributes.
3) The selected splitting criterion.
Output: A decision tree.

The C4.5 decision tree construction module investigates the following attribute selection measures for training on the dataset.

1. Information Gain (IG)
Information gain is based on Claude Shannon's work on information theory. The InfoGain of an attribute A is used to select the best splitting attribute, and the attribute with the highest InfoGain is selected to build the decision tree.

$$\mathrm{InfoGain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D) \qquad \text{(Eq. 1)}$$

where

$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i) \qquad \text{(Eq. 2)}$$

$$\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j) \qquad \text{(Eq. 3)}$$

Here $p_i$ is the proportion of tuples in $D$ that belong to class $i$ (of $m$ classes), and $D_1, \dots, D_v$ are the partitions of $D$ produced by the $v$ distinct values of $A$.

2. Gain Ratio (GR)
Information gain favors attributes with a large number of distinct values. Gain ratio biases the decision tree against such attributes, which overcomes this drawback of information gain.

$$\mathrm{GainRatio}(A) = \frac{\mathrm{InfoGain}(A)}{\mathrm{SplitInfo}_A(D)} \qquad \text{(Eq. 4)}$$

$$\mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|}\log_2\!\left(\frac{|D_j|}{|D|}\right) \qquad \text{(Eq. 5)}$$

Fig. 2. Classification of Risk Factors and CHD Diagnosis

Classification of risk factors using the attribute selection measures, for the coded dataset after preprocessing (Table III). The numerical values in this example are computed with base-10 logarithms. The base cancels in Eq. 4, so the gain ratio is the same as with base-2 logarithms.

1. Information Gain (IG) calculated for Age

$$\mathrm{InfoGain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$$

$$\mathrm{Info}(D) = -\tfrac{1}{4}\log_{10}\tfrac{1}{4} - \tfrac{1}{4}\log_{10}\tfrac{1}{4} - \tfrac{2}{4}\log_{10}\tfrac{2}{4} = 0.4515$$

$$\mathrm{Info}_A(D) = \tfrac{2}{4}\left(-\tfrac{1}{4}\log_{10}\tfrac{1}{4} - \tfrac{1}{4}\log_{10}\tfrac{1}{4} - 0\right) + \tfrac{2}{4}\left(0 - 0 - \tfrac{2}{4}\log_{10}\tfrac{2}{4}\right) = 0.2257$$

$$\mathrm{InfoGain}(A) = 0.4515 - 0.2257 = 0.2258$$

2. Gain Ratio (GR) calculated for Family History

$$\mathrm{GainRatio}(A) = \frac{\mathrm{InfoGain}(A)}{\mathrm{SplitInfo}_A(D)} = \frac{0.2258}{-\tfrac{2}{4}\log_{10}\tfrac{2}{4} - \tfrac{2}{4}\log_{10}\tfrac{2}{4}} = \frac{0.2258}{0.3010} \approx 0.75$$

The attribute with the highest gain ratio is taken as the root node, from which further classification of the risk factors proceeds. The heart disease dataset obtained from the UCI Repository contains 250 records, of which 150 are used as the training dataset and 100 as the testing dataset.

[Figure blocks: Patients consistent Dataset; Classification of risk factors using attribute selection measures; C4.5 decision tree algorithm; Age]
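Eqs. 1 to 5 and the top-down construction described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the author's C4.5 or C5.0 implementation. It handles categorical attributes only and uses base-2 logarithms as in Eqs. 2, 3 and 5. It does no pruning, no handling of continuous or missing values, and no boosting. The four coded rows of Table III serve as a toy training set. The names `info`, `info_gain`, `gain_ratio`, `build_tree`, `classify` and `accuracy` are illustrative only.

```python
from collections import Counter
from math import log2

COLUMNS = ["Age", "Sex", "FH", "SMBEF", "HXHTN", "HXDM", "SMAFT", "SBP",
           "DBP", "TC", "HDLW", "HDLM", "LDL", "TG", "GLU", "CL"]

# Table III (coded dataset); the last column is the class CL.
TABLE_III = [row.split() for row in [
    "3 N Y Y N Y N L H H N L N N N 1",
    "1 Y Y Y N Y Y N N N N N H N H 2",
    "1 Y N N N Y N H N N H N H N N 3",
    "4 N N Y Y Y Y H N H H H N H H 3",
]]


def info(labels):
    """Eq. 2: expected information (entropy in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def partition(rows, attr):
    """Group the class labels of `rows` by the value of the attribute at index `attr`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[-1])
    return list(groups.values())


def info_gain(rows, attr):
    """Eqs. 1 and 3: Info(D) minus the size-weighted Info of each partition D_j."""
    n = len(rows)
    info_a = sum(len(g) / n * info(g) for g in partition(rows, attr))
    return info([r[-1] for r in rows]) - info_a


def split_info(rows, attr):
    """Eq. 5: information carried by the partition sizes alone."""
    n = len(rows)
    return -sum(len(g) / n * log2(len(g) / n) for g in partition(rows, attr))


def gain_ratio(rows, attr):
    """Eq. 4; returns 0 when the attribute has a single value (SplitInfo = 0)."""
    s = split_info(rows, attr)
    return info_gain(rows, attr) / s if s > 0 else 0.0


def build_tree(rows, attrs):
    """Top-down divide and conquer: split on the best gain-ratio attribute, then recurse."""
    labels = [r[-1] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # leaf: pure subset, or no attributes left
    best = max(attrs, key=lambda a: gain_ratio(rows, a))
    rest = [a for a in attrs if a != best]
    branches = {value: build_tree([r for r in rows if r[best] == value], rest)
                for value in {r[best] for r in rows}}
    return (best, branches, majority)


def classify(tree, row):
    """Follow the branches matching `row` until a leaf (a class label) is reached."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if row[attr] not in branches:
            return majority  # unseen attribute value: fall back to the node's majority class
        tree = branches[row[attr]]
    return tree


def accuracy(tree, rows):
    """Fraction of rows whose predicted class equals the recorded class."""
    return sum(classify(tree, r) == r[-1] for r in rows) / len(rows)


attrs = list(range(len(COLUMNS) - 1))  # every column except CL
for a in attrs:
    print(f"{COLUMNS[a]:6} gain={info_gain(TABLE_III, a):.4f} ratio={gain_ratio(TABLE_III, a):.4f}")
tree = build_tree(TABLE_III, attrs)
print("training accuracy:", accuracy(tree, TABLE_III))
```

Eq. 3 computes Info(D_j) from the class proportions inside each partition, and this sketch uses base-2 logarithms. The printed values therefore differ from the hand calculation above. On these four rows, Family History gets InfoGain = 1.0 and GainRatio = 1.0, and Age gets a GainRatio of about 0.667. Several attributes tie for the highest gain ratio on such a tiny sample, so `max` keeps the first one it evaluates (up to floating-point rounding). On the full dataset, the hypothetical call `accuracy(tree, test_rows)` over the 100 held-out records would correspond to the test accuracy reported in the abstract.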


Publication date: 2014